ARROW-1760: [Java] Add Apache Mnemonic (incubating) as alternative backed allocator #36

Closed
wants to merge 1,183 commits into from

bigdata-memory

…note that move allocator services to the service-dist folder as the properties indicated in pom.xml.

@@ -57,6 +59,17 @@ public PooledByteBufAllocatorL(MetricRegistry registry) {
empty = new UnsafeDirectLittleEndian(new DuplicatedByteBuf(Unpooled.EMPTY_BUFFER));
}

public static void setUpMnemonicUnpooledByteBufAllocator(MnemonicUnpooledByteBufAllocator<?> mubballocator) {
Member

Instead of statically setting an allocator, it should be an optional constructor parameter of type ByteBufAllocator.
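A minimal sketch of the reviewer's suggestion, under simplifying assumptions: `BufferBackend`, `DirectBackend` and `PooledAllocatorSketch` are hypothetical names standing in for Netty's `ByteBufAllocator` and for `PooledByteBufAllocatorL`, so the example compiles without Netty or Arrow on the classpath. The backend becomes an optional constructor argument, and the no-argument constructor preserves today's default.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Hypothetical stand-in for Netty's ByteBufAllocator, so the sketch compiles without Netty.
interface BufferBackend {
    ByteBuffer allocate(int size);

    String name();
}

// Hypothetical default backend (stand-in for the existing pooled allocator): plain direct buffers.
final class DirectBackend implements BufferBackend {
    @Override
    public ByteBuffer allocate(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    @Override
    public String name() {
        return "direct";
    }
}

// Hypothetical allocator: the backend is an optional constructor argument, not a static setter.
final class PooledAllocatorSketch {
    private final BufferBackend backend;

    // The no-arg constructor keeps existing callers unchanged.
    PooledAllocatorSketch() {
        this(new DirectBackend());
    }

    // Callers wanting an alternative backend (for example one backed by Mnemonic) pass it here.
    PooledAllocatorSketch(BufferBackend backend) {
        this.backend = Objects.requireNonNull(backend, "backend");
    }

    ByteBuffer buffer(int size) {
        return backend.allocate(size);
    }

    String backendName() {
        return backend.name();
    }
}
```

Because the backend is held per instance rather than in a static field, two allocators with different backends can coexist in one JVM, and one test cannot leak a globally installed allocator into another.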

Author

Got it, I'll try to make it optional. Thanks.

Author

Found that the field INNER_ALLOCATOR of AllocationManager is initialized with "new PooledByteBufAllocatorL()" in
https://github.com/apache/arrow/blob/master/java/memory/src/main/java/org/apache/arrow/memory/AllocationManager.java#L68
so should I add an optional constructor for AllocationManager as well?

The bufferWithoutReservation(..) method of BaseAllocator, in turn, instantiates and uses the AllocationManager:

final AllocationManager manager = new AllocationManager(this, size);

So is this method the best place to inject Mnemonic's allocator as a parameter?
Thanks!
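One possible answer to the question above, sketched under simplifying assumptions: the backend is chosen once on the root allocator, inherited by child allocators, and handed to each allocation manager from `bufferWithoutReservation`, instead of living in a static `INNER_ALLOCATOR` field. All class names below are hypothetical simplifications of `RootAllocator`, `BaseAllocator` and `AllocationManager`, and the backend is modelled as an `IntFunction<ByteBuffer>` rather than a Netty allocator.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

// Hypothetical simplification of AllocationManager: the backend is passed in by the allocator
// that creates it, instead of being read from a static field.
final class AllocationManagerSketch {
    final ByteBuffer memory;

    AllocationManagerSketch(IntFunction<ByteBuffer> backend, int size) {
        this.memory = backend.apply(size);
    }
}

// Hypothetical simplification of BaseAllocator.
class BaseAllocatorSketch {
    private final IntFunction<ByteBuffer> backend;

    BaseAllocatorSketch(IntFunction<ByteBuffer> backend) {
        this.backend = backend;
    }

    // Children inherit the parent's backend, so the choice is made once at the root.
    BaseAllocatorSketch newChildAllocator() {
        return new BaseAllocatorSketch(backend);
    }

    ByteBuffer bufferWithoutReservation(int size) {
        AllocationManagerSketch manager = new AllocationManagerSketch(backend, size);
        return manager.memory;
    }
}

// Hypothetical simplification of RootAllocator with an optional backend argument.
final class RootAllocatorSketch extends BaseAllocatorSketch {
    // Default: direct memory, standing in for the existing pooled allocator.
    RootAllocatorSketch() {
        super(ByteBuffer::allocateDirect);
    }

    RootAllocatorSketch(IntFunction<ByteBuffer> backend) {
        super(backend);
    }
}
```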

kou and others added 29 commits August 16, 2017 09:17
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#968 from kou/add-new-committers and squashes the following commits:

710558b [Kouhei Sutou] [Website] Add new committers
…release blog post

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#967 from wesm/ARROW-1353 and squashes the following commits:

804fe35 [Wes McKinney] Escape underscores in CHANGELOG.md
1b7c4b6 [Wes McKinney] Finish 0.6.0 blog post
a78cb94 [Wes McKinney] Some updates for 0.6.0 site update
Closes apache#970

Change-Id: I49ea3f7f99d080c517fb21b86b7a27e17b04e20b
…rite_table

Closes apache#971

Change-Id: I7c689b200a4f04af51928f6765362fef52c613e8
…mple. Update API doc site build instructions

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#973 from wesm/site-doc-updates and squashes the following commits:

8884b4a4 [Wes McKinney] Remove outdated pyarrow.jemalloc_memory_pool example. Add --with-plasma to Python doc build
Make Arrow buildable with jdk9:
- upgrade checkstyle plugin to 6.19
- upgrade assembly plugin to 3.0.0
- update jmockit version to 1.33

Also add travis entry to build using Oracle JDK9 EA

Author: Laurent Goujon <laurent@dremio.com>

Closes apache#966 from laurentgo/laurent/jdk-9 and squashes the following commits:

d009d01 [Laurent Goujon] Make mvn site optional since not working yet with jdk9
b3e5822 [Laurent Goujon] Update plugin version according to Maven team recommendations
d62d409 [Laurent Goujon] Fix travis id for jdk9
92fe6d4 [Laurent Goujon] Make Arrow buildable with jdk9
cc @jacques-n , @StevenMPhillips

Patch Summary:

As part of ARROW-801, we recently added getValidityBufferAddress(), getOffsetBufferAddress(), getDataBufferAddress() interfaces to get the virtual address of the ArrowBuf.

We now have the following new interfaces to get the corresponding ArrowBuf:

getValidityBuffer()
getDataBuffer()
getOffsetBuffer()

Background:

Currently the getBuffer() method is implemented in the BaseDataValueVector abstract class. As part of the patch for ARROW-276, NullableValueVectors no longer extend BaseDataValueVector; they don't need to, since they don't use its underlying data buffer (the ArrowBuf data field).

A call to getBuffer() on a NullableValueVector simply delegates to getBuffer() of the underlying data/value vector.

Problem:

If a piece of code works with the ValueVector abstraction and the expected runtime type is Nullable<something>Vector, the compiler rejects
(v of type ValueVector).getBuffer(), since ValueVector does not declare getBuffer().

Until now this worked because we kept the compiler happy by casting the ValueVector to BaseDataValueVector and then calling ((BaseDataValueVector)(v of type ValueVector)).getBuffer(). This code broke because NullableValueVectors are no longer a subtype of BaseDataValueVector; the inheritance hierarchy was changed as part of ARROW-276.

Solution:

Similar to what was done in ARROW-801, we have added new methods to the ValueVector interface to get the underlying buffers. ValueVector has always had the methods getBuffers(), getBufferSizeFor() and getBufferSize(), so it makes sense to augment the ValueVector interface with the new APIs.

It looks like new unit tests are not needed since the unit tests added for ARROW-801 test the new APIs as well --> getDataBufferAddress() underneath invokes getDataBuffer() to get the memory address of ArrowBuf so we are good.
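To illustrate the problem and the fix described above, here is a heavily simplified sketch; `ValueVectorSketch`, `NullableIntVectorSketch` and `VectorCaller` are hypothetical and use `java.nio.ByteBuffer` in place of `ArrowBuf`. Once the buffer accessors are declared on the interface, a caller holding only the interface type reaches the data buffer without casting to a base class that nullable vectors no longer extend.

```java
import java.nio.ByteBuffer;

// Hypothetical, heavily simplified stand-in for the ValueVector interface.
interface ValueVectorSketch {
    ByteBuffer getValidityBuffer();

    ByteBuffer getDataBuffer();

    ByteBuffer getOffsetBuffer();
}

// A nullable fixed-width vector that does NOT extend any BaseDataValueVector-like class.
final class NullableIntVectorSketch implements ValueVectorSketch {
    private final ByteBuffer validity;
    private final ByteBuffer values;

    NullableIntVectorSketch(int valueCount) {
        this.validity = ByteBuffer.allocate((valueCount + 7) / 8);      // one bit per slot
        this.values = ByteBuffer.allocate(valueCount * Integer.BYTES);  // 4 bytes per int
    }

    @Override
    public ByteBuffer getValidityBuffer() {
        return validity;
    }

    @Override
    public ByteBuffer getDataBuffer() {
        return values;
    }

    // In this sketch, fixed-width vectors simply have no offset buffer.
    @Override
    public ByteBuffer getOffsetBuffer() {
        throw new UnsupportedOperationException("fixed-width vectors have no offset buffer");
    }
}

final class VectorCaller {
    // Works for any ValueVectorSketch without a downcast, because the accessor lives on the interface.
    static int dataBytes(ValueVectorSketch v) {
        return v.getDataBuffer().capacity();
    }
}
```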

Author: siddharth <siddharth@dremio.com>

Closes apache#976 from siddharthteotia/ARROW-1373 and squashes the following commits:

1ef2022 [siddharth] Fixed failures and added javadocs
e5ff023 [siddharth] ARROW-1373: Implement getBuffer() methods for ValueVector
Closes apache#977

Change-Id: I494db4952036a8e52078f1d698d003904f91a34f
The method for starting the Plasma store is already documented in https://arrow.apache.org/docs/python/plasma.html. So far it only worked if the store was installed with "make install" from the C++ sources. This makes it also possible to start it if the pyarrow wheels are installed.

Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#975 from pcmoritz/plasma-store-ep and squashes the following commits:

eddc487 [Philipp Moritz] make plasma store entry point private
4c05140 [Philipp Moritz] define entry point for the plasma store
…a put performance

This PR makes it possible to use Plasma object store backed by a pre-mounted hugetlbfs.

Author: Philipp Moritz <pcmoritz@gmail.com>
Author: Alexey Tumanov <atumanov@gmail.com>

Closes apache#974 from atumanov/putperf and squashes the following commits:

077b78f [Philipp Moritz] add more comments
5aa4b0d [Philipp Moritz] preflight script formatting changes
22188a6 [Philipp Moritz] formatting
ffb9916 [Philipp Moritz] address comments
225429b [Philipp Moritz] update documentation with Alexey's fix
713a0c4 [Philipp Moritz] add missing includes
4c976bb [Philipp Moritz] make format
fb8e1b4 [Philipp Moritz] add helpful error message
7260d59 [Philipp Moritz] expose number of threads to python and try out cleanups
98b603e [Alexey Tumanov] map_populate on linux; fall back to mlock/memset otherwise
ce90ef4 [Alexey Tumanov] documenting new plasma store info fields
c52f211 [Philipp Moritz] cleanups (TODO: See if memory locking helps)
4702703 [Philipp Moritz] preliminary documentation
3073a99 [Alexey Tumanov] reenable hashing
a20ca56 [Alexey Tumanov] fix bug
dd04b87 [Alexey Tumanov] [arrow][putperf] enable HUGETLBFS support on linux
…he Arrow

This PR adds the capability to serialize a large class of (nested) Python objects in Apache Arrow. The eventual goal is to evolve this into a more modern version of pickle that will make it possible to read the data from other languages supported by Apache Arrow (and might also be faster).

Currently we support lists, tuples, dicts, strings, numpy objects, Python classes and namedtuples. A fallback to (cloud-)pickle can be provided for objects that cannot be natively represented in Arrow (for example lambdas).

Numpy data within objects is efficiently represented using Arrow's Tensor facilities and for the nested Python sequences we use Arrow's UnionArray.

There are many loose ends that will need to be addressed in follow up PRs.

Author: Philipp Moritz <pcmoritz@gmail.com>
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#965 from pcmoritz/python-serialization and squashes the following commits:

31486ed [Wes McKinney] Fix typo
2164db7 [Wes McKinney] Add SerializedPyObject to public API
b70235c [Wes McKinney] Add pyarrow.deserialize convenience method
a6a402e [Wes McKinney] Memory map fixture robustness on Windows
114a5fb [Wes McKinney] Add a Python container for the SerializedPyObject data, total_bytes method
8e59617 [Wes McKinney] Use pytest tmpdir for large memory map fixture so works on Windows
8a42f30 [Wes McKinney] Add doxygen comment to set_serialization_callbacks
a9522c5 [Wes McKinney] Refactoring, address code review comments. fix flake8 issues
ce5784d [Wes McKinney] Do not use ARROW_CHECK in production code. Consolidate python_to_arrow code
c8efef9 [Wes McKinney] Fix various Clang compiler warnings due to integer conversions. clang-format
831e2f2 [Philipp Moritz] remove sequence.h
54af39b [Philipp Moritz] more fixes
a6fdb76 [Philipp Moritz] make tests work
fe56c73 [Philipp Moritz] fixes
84d62f6 [Philipp Moritz] more fixes
49aba8a [Philipp Moritz] make it compile on windows
aa1f300 [Philipp Moritz] linting
95cb9da [Philipp Moritz] fix GIL
adcc8f7 [Philipp Moritz] shuffle stuff around
bcebdfe [Philipp Moritz] fix longlong vs int64 and unsigned variant
4cc45cd [Philipp Moritz] cleanup
f25f3f3 [Philipp Moritz] cleanups
a88d410 [Philipp Moritz] convert DESERIALIZE_SEQUENCE back to a macro
c425978 [Philipp Moritz] prevent possible memory leaks
aeafd82 [Philipp Moritz] fix callbacks
389bfc6 [Philipp Moritz] documentation
2f0760c [Philipp Moritz] fix api
faf9a3e [Philipp Moritz] make exported API more consistent
e1fc0c5 [Philipp Moritz] restructure
c1f377b [Philipp Moritz] more fixes
3e94e6d [Philipp Moritz] clang-format
99e2d1a [Philipp Moritz] cleanups
3298329 [Philipp Moritz] mutable refs and small fixes
e73c1ea [Philipp Moritz] make DictBuilder private
3929273 [Philipp Moritz] increase Py_True refcount and hide helper methods
aaf6f09 [Philipp Moritz] remove code duplication
c38c58d [Philipp Moritz] get rid of leaks and clarify reference counting for dicts
74b9e46 [Philipp Moritz] convert DESERIALIZE_SEQUENCE to a template
080db03 [Philipp Moritz] fix first few comments
a6105d2 [Philipp Moritz] lint fix
802e739 [Philipp Moritz] clang-format
2e08de4 [Philipp Moritz] fix namespaces
91b57d5 [Philipp Moritz] fix linting
c4782ac [Philipp Moritz] fix
7069e20 [Philipp Moritz] fix imports
2171761 [Philipp Moritz] fix python unicode string
30bb960 [Philipp Moritz] rebase
f229d8d [Philipp Moritz] serialization of custom objects
8b2ffe6 [Philipp Moritz] working version
bd36c83 [Philipp Moritz] handle very long longs with custom serialization callback
49a4acb [Philipp Moritz] roundtrip working for the first time
44fb98b [Philipp Moritz] work in progress
3af1c67 [Philipp Moritz] deserialization path (need to figure out if base object and refcounting is handled correctly)
deb3b46 [Philipp Moritz] rename serialization entry point
5766b8c [Philipp Moritz] python to arrow serialization
… back to pandas form

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#979 from wesm/ARROW-1357 and squashes the following commits:

8318a12 [Wes McKinney] Use PyLong_FromLongLong so Windows is happy
18acdd9 [Wes McKinney] Account for chunked arrays when converting lists back to pandas form
Author: Max Risuhin <risuhin.max@gmail.com>

Closes apache#980 from MaxRis/ARROW-1375 and squashes the following commits:

f5e4156 [Max Risuhin] ARROW-1375: [C++] Remove dependency on msvc version for Snappy build
…UDA tests

This is an optional leaf library for users who want to use Arrow data on graphics cards. See the parent JIRA ARROW-1055 for a roadmap of some basic GPU extensions.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#982 from wesm/arrow-gpu-lib and squashes the following commits:

f8c00eb [Wes McKinney] Remove cruft from CMakeLists.txt
e8f04a8 [Wes McKinney] Set up libarrow_gpu, add simple unit test that allocates memory on device

Change-Id: Ia1851ea6f30cb7cf3de422779d2d029e4ded672f
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#983 from wesm/ARROW-1395 and squashes the following commits:

c105a21 [Wes McKinney] Remove deprecated APIs from <= 0.4.0
…atch as an IPC message to a new buffer

There's also a little bit of API scrubbing as I went through this code.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#984 from wesm/ARROW-1384 and squashes the following commits:

a3996fe [Wes McKinney] Add DCHECK to catch unequal schemas
2952cfb [Wes McKinney] Add SerializeRecordBatch API, various API scrubbing, make some integer arguments const
This makes it easy to write from host to device and read from device to host. We also need a zero-copy device reader for IPC purposes (where we don't want to move any data to the host), can do that in a subsequent patch.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#985 from wesm/ARROW-1392 and squashes the following commits:

ae24cb5 [Wes McKinney] Add section to C++ README about building libarrow_gpu
229a268 [Wes McKinney] Refactor CudaBufferReader to return zero-copy device pointers. Add unit tests
415157a [Wes McKinney] Make Tell overrides in arrow-glib const
5daa59e [Wes McKinney] Add cuda-benchmark module
1cf1196 [Wes McKinney] Test CudaBuffer::CopyFromHost
a2708f2 [Wes McKinney] Implement IO interfaces for CUDA buffers
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#989 from wesm/ARROW-1386 and squashes the following commits:

be3b53a [Wes McKinney] Unpin CMake version in MSVC toolchain builds now that 3.9.1 is in conda-forge
…f sign bit

* Reimplement Decimal128 types to use the Int128 type as the underlying integer
representation, adapted from the Apache ORC project's C++ in memory format.
This enables us to write integration tests and results in an in-memory
Decimal128 format that is compatible with the Java implementation
* Additionally, this PR fixes Decimal slice comparison and adds related
regression tests
* Follow-ups include ARROW-695 (C++ Decimal integration tests), ARROW-696 (JSON
read/write support for decimals) and ARROW-1238 (Java Decimal integration
tests).
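A minimal Java sketch of the representation described above, assuming a 128-bit two's-complement integer split into a signed high word and an unsigned low word, serialized as 16 little-endian bytes. `Int128Sketch` is a hypothetical name; it is not Arrow's C++ `Int128` class or the Java `DecimalVector` API. `java.math.BigInteger` serves as the reference for round-tripping.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical 128-bit two's-complement integer stored as two 64-bit halves.
final class Int128Sketch {
    private static final BigInteger MASK_64 = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    final long high; // upper 64 bits; its top bit is the sign bit of the whole value
    final long low;  // lower 64 bits, interpreted as unsigned

    Int128Sketch(long high, long low) {
        this.high = high;
        this.low = low;
    }

    static Int128Sketch fromBigInteger(BigInteger v) {
        // Signed range is [-2^127, 2^127 - 1]; bitLength() excludes the sign bit.
        if (v.bitLength() > 127) {
            throw new ArithmeticException("value does not fit in 128 bits: " + v);
        }
        long low = v.longValue();                 // low-order 64 bits of the two's-complement form
        long high = v.shiftRight(64).longValue(); // shiftRight sign-extends, preserving the sign
        return new Int128Sketch(high, low);
    }

    BigInteger toBigInteger() {
        BigInteger hi = BigInteger.valueOf(high).shiftLeft(64);
        BigInteger lo = BigInteger.valueOf(low).and(MASK_64); // reinterpret low as unsigned
        return hi.add(lo);
    }

    // 16 bytes, least significant byte first (assumed little-endian layout).
    byte[] toLittleEndianBytes() {
        return ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN).putLong(low).putLong(high).array();
    }
}
```

A decimal's scale lives in its type, not in these 16 bytes, so for example `new BigDecimal("-123.45").unscaledValue()` (that is, -12345 at scale 2) is what would be passed to `fromBigInteger`.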

Author: Phillip Cloud <cpcloud@gmail.com>

Closes apache#981 from cpcloud/decimal-rewrite and squashes the following commits:

53ce04b [Phillip Cloud] Formatting
fe13ef3 [Phillip Cloud] Remove redundant constructor
86db184 [Phillip Cloud] Subclass from FixedSizeBinaryArray for code reuse
535f9ff [Phillip Cloud] Use a macro for cases
1cc43ce [Phillip Cloud] Use CHAR_BIT
355fb24 [Phillip Cloud] Include the correct header for _BitScanReverse
b53d7cd [Phillip Cloud] Share comparison code
162eeeb [Phillip Cloud] BUG: Double export
b98c894 [Phillip Cloud] BUG: Export symbols
be220c8 [Phillip Cloud] Cast so we have enough space to contain the integer
5716010 [Phillip Cloud] Cast 18 to matching type size_t for msvc
8833904 [Phillip Cloud] Remove unnecessary args to sto* function calls
628ce85 [Phillip Cloud] Fix more docs
e4a1792 [Phillip Cloud] More const
8ecb315 [Phillip Cloud] Formatting
178d3f2 [Phillip Cloud] NOLINT for MSVC specific and necessary types
38c9b50 [Phillip Cloud] Fix doc style in int128.h and add const where possible
2930d7b [Phillip Cloud] Fix naming convention in decimal-test.cc
1eab5c4 [Phillip Cloud] Remove unnecessary header from CMakeLists.txt
22eda4b [Phillip Cloud] kMaximumPrecision
9af97d8 [Phillip Cloud] MSVC fix
349dc58 [Phillip Cloud] ARROW-786: [Format] In-memory format for 128-bit Decimals, handling of sign bit
…chema, ReadSchema public APIs

This is mostly moving code around. In reviewing I recommend focusing on the public headers. There were a number of places where it is more consistent to use naked pointers versus shared_ptr. Also some constructors were returning shared_ptr to subclass, where it would be simpler for clients to return a pointer to base.

This includes ARROW-1376 and ARROW-1406

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#988 from wesm/ARROW-1408 and squashes the following commits:

b156767 [Wes McKinney] Fix up glib bindings, undeprecate some APIs
4bdebfa [Wes McKinney] Add serialize methods to RecordBatch, Schema. Test round trip
ef12e0f [Wes McKinney] Fix a valgrind warning
73d30c9 [Wes McKinney] Better comments
8597b96 [Wes McKinney] Remove API that was never intended to be public, unlikely to be used anywhere
122a759 [Wes McKinney] Refactoring sweep and cleanup of public IPC API. Move non-public APIs from metadata.h to metadata-internal.h and create message.h, dictionary.h
b646f96 [Wes McKinney] Set device in more places
…ore.

cc @pcmoritz @atumanov

Author: Robert Nishihara <robertnishihara@gmail.com>

Closes apache#992 from robertnishihara/removemappopulate and squashes the following commits:

8ed9612 [Robert Nishihara] Remove unnecessary ifdef.
7b75bd9 [Robert Nishihara] Remove MAP_POPULATE flag when mmapping files in Plasma store.
…rted dependency errors

There are a couple of dependency issues in the current Maven config. They leak into integrating projects, which then either have to declare foreign dependencies because Arrow doesn't list them properly, or pull in unnecessary dependencies because Arrow lists them improperly.

* ```arrow-format```
```
[WARNING] Unused declared dependencies found:
[WARNING]    org.slf4j:slf4j-api:jar:1.7.25:compile
[WARNING]    com.vlkan:flatbuffers:jar:1.2.0-3f79e055:compile
[WARNING]    io.netty:netty-handler:jar:4.0.49.Final:compile
[WARNING]    com.google.guava:guava:jar:18.0:compile
```
* ```arrow-memory```
```
[WARNING] Used undeclared dependencies found:
[WARNING]    io.netty:netty-buffer:jar:4.0.49.Final:compile
[WARNING]    io.netty:netty-common:jar:4.0.49.Final:compile
[WARNING] Unused declared dependencies found:
[WARNING]    com.carrotsearch:hppc:jar:0.7.2:compile
[WARNING]    io.netty:netty-handler:jar:4.0.49.Final:compile
```
* ```arrow-tools```
```
[WARNING] Used undeclared dependencies found:
[WARNING]    com.fasterxml.jackson.core:jackson-databind:jar:2.7.9:compile
[WARNING]    com.fasterxml.jackson.core:jackson-core:jar:2.7.9:compile
[WARNING] Unused declared dependencies found:
[WARNING]    org.apache.commons:commons-lang3:jar:3.6:compile
[WARNING]    org.apache.arrow:arrow-format:jar:0.7.0-SNAPSHOT:compile
[WARNING]    io.netty:netty-handler:jar:4.0.49.Final:compile
```
* ```arrow-vector```
```
[WARNING] Used undeclared dependencies found:
[WARNING]    com.google.code.findbugs:jsr305:jar:3.0.2:compile
[WARNING]    com.vlkan:flatbuffers:jar:1.2.0-3f79e055:compile
[WARNING]    io.netty:netty-common:jar:4.0.49.Final:compile
[WARNING]    io.netty:netty-buffer:jar:4.0.49.Final:compile
[WARNING]    com.fasterxml.jackson.core:jackson-core:jar:2.7.9:compile
[WARNING] Unused declared dependencies found:
[WARNING]    org.apache.commons:commons-lang3:jar:3.6:compile
[WARNING]    io.netty:netty-handler:jar:4.0.49.Final:compile
```

I am proposing this PR to:
1. Add maven-dependency-plugin to enforce that all dependencies are always listed correctly
2. Fix all the current dependency issues

Author: Antony Mayi <antonymayi@yahoo.com>
Author: Stepan Kadlec <stepan.kadlec@oracle.com>

Closes apache#978 from antonymayi/master and squashes the following commits:

d7f081e [Antony Mayi] moving `copy-flatc` to initialize phase and `analyze` execution to parent pom
ec72717 [Antony Mayi] removing unused apache.commons.lang3, fixing pom
8cbfe5f [Antony Mayi] maven-dependency-plugin: ignoring dependencies of generated sources in arrow-vector
dc833bb [Stepan Kadlec] adding maven-dependency-plugin and fixing all reported dependency errors
Author: Phillip Cloud <cpcloud@gmail.com>

Closes apache#993 from cpcloud/ARROW-1411 and squashes the following commits:

741269f [Phillip Cloud] ARROW-1411: [Python] Booleans in Float Columns cause Segfault
When configured, this looks like:

```
#define ARROW_CUDA_ABI_VERSION_MAJOR 8
#define ARROW_CUDA_ABI_VERSION_MINOR 0
```

I'm not sure how to use this yet. It would be nice to work out how to let third-party users detect incompatibility with their nvcc at compile time.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#990 from wesm/ARROW-1399 and squashes the following commits:

1ad6966 [Wes McKinney] Add CUDA build version defines in public headers
Apache Arrow C++ uses signed integer types (such as int64_t) as the
result type for expressions that use size_t. This causes
sign-conversion warnings, but the coding style is intentional.

Example:

    .../arrow/buffer.h:296:41: warning:
          implicit conversion changes signedness: 'unsigned long' to 'int64_t'
          (aka 'long') [-Wsign-conversion]
      int64_t length() const { return size_ / sizeof(T); }
                               ~~~~~~ ~~~~~~^~~~~~~~~~~

Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#999 from kou/glib-suppress-warning-on-clang and squashes the following commits:

397490e [Kouhei Sutou] [GLib] Suppress sign-conversion warnings
This PR slightly reduces ambiguity in the array example for null bitmaps. The original example was left/right symmetric; this PR changes the example to break that symmetry. Asymmetry is important since readers who skip the byte endianness section could have interpreted the bitmap buffer in two distinct ways: left-to-right with an offset of 3 (wrong), or right-to-left with zero offset (correct).
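The right-to-left reading can be made concrete with a small sketch of how a validity bitmap is built and queried, assuming least-significant-bit numbering within each byte. `ValidityBitmapSketch` is hypothetical and ignores the buffer padding and alignment the format also requires.

```java
// Hypothetical helper: bit i lives in byte i / 8 at position i % 8,
// counting from the least significant bit (so each byte reads right to left).
final class ValidityBitmapSketch {
    static byte[] build(Integer[] values) {
        byte[] bitmap = new byte[(values.length + 7) / 8];
        for (int i = 0; i < values.length; i++) {
            if (values[i] != null) {
                bitmap[i >> 3] |= (byte) (1 << (i & 7)); // set bit (i % 8) of byte (i / 8)
            }
        }
        return bitmap;
    }

    static boolean isValid(byte[] bitmap, int i) {
        return ((bitmap[i >> 3] >> (i & 7)) & 1) == 1;
    }
}
```

For `[1, null, 2, 4, 8]` the single bitmap byte is `0b00011101`: read from the right, slots 0, 2, 3 and 4 are valid and slot 1 is null.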

Author: Fritz Obermeyer <fritz.obermeyer@gmail.com>

Closes apache#998 from fritzo/patch-1 and squashes the following commits:

af3dcbd [Fritz Obermeyer] Clarify memory layout documentation
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#996 from kou/glib-cast-after-status-check and squashes the following commits:

02b59db [Kouhei Sutou] [GLib] Cast after status check
…hon objects

Note that this PR breaks the PlasmaClient API (which is still unstable at this point, so this is acceptable). It renames PlasmaClient.get to PlasmaClient.get_buffers and introduces two new functions, PlasmaClient.put and PlasmaClient.get which can put Python objects into the object store and provide access to their content. The old get was renamed to get_buffers because most users will want to use the new get method and therefore it should have the more concise name.

There is some freedom in designing the API; I tried to make it so there is a unified API between getting one and multiple objects (the latter is supported to limit the number of IPC roundtrips with the plasma store when we get many small objects). I also introduced a special object that is returned if one of the objects was not available within the timeout. We could use "None" here, but then it would be hard to distinguish between getting a "None" object and a timeout.
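The sentinel design choice can be sketched independently of Plasma. The example is hypothetical: `ObjectStoreSketch` and `NOT_AVAILABLE` are illustrative names (the API described above is Python's `PlasmaClient.get`/`put`), no timeout or IPC is modelled, and a missing id stands in for an object that did not become available in time. It shows why a dedicated marker object, rather than `null`, keeps "stored null" and "not available" distinguishable, and how the single-object `get` can delegate to the batch form.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store illustrating the sentinel design; no timeout or IPC is modelled.
final class ObjectStoreSketch {
    // A unique marker that can never be equal to a stored value, including null.
    static final Object NOT_AVAILABLE = new Object() {
        @Override
        public String toString() {
            return "ObjectNotAvailable";
        }
    };

    private final Map<String, Object> objects = new HashMap<>();

    void put(String id, Object value) {
        objects.put(id, value); // HashMap accepts null values
    }

    // One call for many ids (fewer round trips in a real store); missing ids map to the sentinel.
    List<Object> get(List<String> ids) {
        List<Object> result = new ArrayList<>(ids.size());
        for (String id : ids) {
            result.add(objects.containsKey(id) ? objects.get(id) : NOT_AVAILABLE);
        }
        return result;
    }

    // The single-object form delegates to the batch form, keeping one code path.
    Object get(String id) {
        return get(Collections.singletonList(id)).get(0);
    }
}
```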

Author: Philipp Moritz <pcmoritz@gmail.com>

Closes apache#995 from pcmoritz/plasma-putget and squashes the following commits:

bd24e01 [Philipp Moritz] add documentation
e60ea73 [Philipp Moritz] get_buffer -> get_buffers and update example
8c36903 [Philipp Moritz] support full API
5921148 [Philipp Moritz] move put and get into PlasmaClient
cf4bf24 [Philipp Moritz] add type information
0049c67 [Philipp Moritz] fix flake8 linting
44c3b3d [Philipp Moritz] fixes
20b119e [Philipp Moritz] make it possible to get single objects
36f67d6 [Philipp Moritz] implement ObjectID.from_random
c044954 [Philipp Moritz] add documentation
eb9694a [Philipp Moritz] implement timeouts
3518c71 [Philipp Moritz] fix
e1924a4 [Philipp Moritz] add put and get
44ada47 [Philipp Moritz] export symbols
…o GPU device memory

This additionally does a few things:

* Change libarrow_gpu to use CUDA driver API instead of runtime API
* Adds code for exporting buffers using CUDA IPC on Linux, but this is not yet tested

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1000 from wesm/ARROW-1364 and squashes the following commits:

e436755 [Wes McKinney] Add newline at end of file
a8812af [Wes McKinney] Complete basic IPC message and record batch reads on GPU device memory
16d628f [Wes McKinney] More Arrow IPC scaffolding
591aceb [Wes McKinney] Draft SerializeRecordBatch for CUDA
84e4525 [Wes McKinney] Add classes and methods for simplifying use of CUDA IPC machinery. No tests yet
508febb [Wes McKinney] Test suite passing again
f3c724e [Wes McKinney] Get things compiling / linking using driver API
5d686fe [Wes McKinney] More progress
2840c60 [Wes McKinney] Progress
3a37fdf [Wes McKinney] Start cuda context class
03d0baf [Wes McKinney] Start cuda_ipc file
Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1238 from wesm/ARROW-1654 and squashes the following commits:

2e6f9e3 [Wes McKinney] Add pickling test cases for timestamp, decimal
1827b23 [Wes McKinney] Fix pickling on py27, implement for Schema. Also pickle field/schema metadata
1395583 [Wes McKinney] Implement pickling for list, struct, add __richcmp__ for Field
366f428 [Wes McKinney] Start implementing pickling for DataType, Field
@wesm
Member

wesm commented Oct 23, 2017

It seems like this is still an interesting optional extension. @bigdata-memory are you interested in rebasing this and making this an optional extension (arrow-mnemonic)?

@bigdata-memory
Author

@wesm sure, I will do it, Thanks.

Licht-T and others added 5 commits October 24, 2017 12:41
This closes [ARROW-1720](https://issues.apache.org/jira/browse/ARROW-1720).

Author: Licht-T <licht-t@outlook.jp>

Closes apache#1243 from Licht-T/fix-unbound-chunk and squashes the following commits:

cabdd43 [Licht-T] TST: Add bounds check tests for chunk getter
bda7f4c [Licht-T] BUG: Implement bounds check in chunk getter
This got messed up during one of the patches in which these files were refactored. Once the build fails, I will fix the lint errors.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1242 from wesm/ARROW-1711 and squashes the following commits:

cd4b655 [Wes McKinney] Fix more flake8 warnings
2eb8bf4 [Wes McKinney] Fix flake8 issues
cef7a7c [Wes McKinney] Fix flake8 calls to lint the right directories
… to avoid using nullptr in public headers

cc @TobyShaw. Can you test this?

Close apache#1098

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#1228 from wesm/ARROW-1134 and squashes the following commits:

bf18158 [Wes McKinney] Only define NULLPTR if not already defined
a51dd88 [Wes McKinney] Add NULLPTR macro to avoid using nullptr in public headers for C++/CLI users
Author: Phillip Cloud <cpcloud@gmail.com>

Closes apache#1211 from cpcloud/ARROW-1588 and squashes the following commits:

ae0d562 [Phillip Cloud] ARROW-1588: [C++/Format] Harden Decimal Format
Author: Kouhei Sutou <kou@clear-code.com>

Closes apache#1247 from kou/c-glib-release-verify and squashes the following commits:

e9f2307 [Kouhei Sutou] [GLib] Add setup description to verify C GLib build
@bigdata-memory
Author

Hi, please take a look at this PR. It compiles with tests skipped, but running the tests reports many errors like the following. Please help, thanks!
"java.lang.NoClassDefFoundError: Could not initialize class org.apache.arrow.memory.RootAllocator"

@bigdata-memory
Author

Found that mnemonic-pmalloc-service must be separated from the dependencies.
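For context, "Could not initialize class" is how the JVM reports an access to a class whose static initializer already failed once: the first access throws `ExceptionInInitializerError` carrying the original exception, and later accesses fail with `NoClassDefFoundError` instead, so the earliest error in a test log is usually the informative one. The sketch demonstrates this with a hypothetical `Fragile` class; whether the `RootAllocator` failure above had exactly this shape (for example a memory service missing at class-initialization time) is an assumption the thread does not confirm.

```java
// Demonstrates how a failing static initializer surfaces on first and later access.
final class StaticInitFailureDemo {
    // Hypothetical class whose static initialization depends on something missing at runtime.
    static final class Fragile {
        static final int VALUE = init();

        private static int init() {
            throw new IllegalStateException("memory service not found");
        }
    }

    static String access() {
        try {
            return "ok: " + Fragile.VALUE;
        } catch (ExceptionInInitializerError e) {
            // First access: the original exception is available as the cause.
            return "first: " + e.getCause().getMessage();
        } catch (NoClassDefFoundError e) {
            // Later accesses: the class is marked as failed and the JVM reports this instead.
            return "later: " + e.getMessage();
        }
    }
}
```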

…note that move allocator services to the service-dist folder as the properties indicated in pom.xml.
@bigdata-memory
Author

Compilation and tests now pass. Please review, thanks!

@wesm wesm changed the title added Mnemonic infra. as an alternative backed allocation mechanism, … ARROW-1760: [Java] Add Apache Mnemonic (incubating) as alternative backed allocator Nov 1, 2017
@wesm
Member

wesm commented Nov 1, 2017

@jacques-n @siddharthteotia could someone from the Java side take a look at this? As long as it does not conflict with normal users of Arrow, giving users the option to experiment with non-volatile memory seems like a reasonable idea. I'm not personally qualified to review the Java code.

@jacques-n
Contributor

I think we should look at doing this in a cleaner way. Having setters on a static interface seems like a bit of a hack. I also think it probably makes sense to expose a location property (or similar) as well as the ability to move memory between domains. A good way might be to have an optional constructor for RootAllocator that takes a new interface. The default could be a wrapped version of the existing static pooled UDLE (UnsafeDirectLittleEndian) allocator.

The allocator capacity should also be tied to the subsystem. Right now I think we're constrained by the JVM's direct memory capacity, but that may not be true when using the other allocator.

Also, any idea of the performance with the alternative allocator? Does Mnemonic have its own intelligent allocator? The normal path uses a nice allocator to manage chunks of various sizes. The model presented here operates above that allocator (it seems like it should maybe sit below the Netty allocator and be used for chunk allocations rather than final allocations), so I wonder how smaller allocations would work (I don't know Mnemonic well).

It seems like people should be able to inspect as well as change the memory tier in which an ArrowBuf is located, for example to move a buffer between memory, NVMe and disk. Thoughts? In that case we need to think more about how we manage allocation tracking. You'd potentially want constraints and/or reservations per domain.

I would also prefer not to make Mnemonic a required dependency. We should look at how we can make it optional. If we do something more interface-based at the RootAllocator level, this should be possible.
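A hypothetical sketch of the per-domain accounting suggested above: each tier has its own capacity and reservations, and a move reserves in the target tier before releasing the source. `MemoryTier` and `TierAccountingSketch` are illustrative names, the tiers come from the comment's memory/NVMe/disk example, and only the bookkeeping is modelled; no memory is actually allocated or copied.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical tiers, taken from the "memory, nvme and disk" example in the comment.
enum MemoryTier { DRAM, NVME, DISK }

// Per-tier capacity and reservation tracking; bookkeeping only, no bytes are copied.
final class TierAccountingSketch {
    private final Map<MemoryTier, Long> limits = new EnumMap<>(MemoryTier.class);
    private final Map<MemoryTier, Long> used = new EnumMap<>(MemoryTier.class);

    TierAccountingSketch(Map<MemoryTier, Long> limits) {
        this.limits.putAll(limits);
        for (MemoryTier tier : MemoryTier.values()) {
            used.put(tier, 0L);
        }
    }

    // Succeeds only within the tier's own limit; tiers with no configured limit have zero capacity.
    synchronized boolean reserve(MemoryTier tier, long bytes) {
        long limit = limits.getOrDefault(tier, 0L);
        long current = used.get(tier);
        if (bytes < 0 || current + bytes > limit) {
            return false;
        }
        used.put(tier, current + bytes);
        return true;
    }

    synchronized void release(MemoryTier tier, long bytes) {
        long current = used.get(tier);
        if (bytes < 0 || bytes > current) {
            throw new IllegalStateException("releasing more than reserved in " + tier);
        }
        used.put(tier, current - bytes);
    }

    // Reserve in the target tier first, so a failed move leaves the source untouched.
    synchronized boolean move(MemoryTier from, MemoryTier to, long bytes) {
        if (used.get(from) < bytes || !reserve(to, bytes)) {
            return false;
        }
        release(from, bytes);
        return true;
    }

    synchronized long usedBytes(MemoryTier tier) {
        return used.get(tier);
    }
}
```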

@bigdata-memory
Author

Mnemonic has three allocators, i.e. VolatileMemAllocator, NonVolatileMemAllocator and SysMemAllocator, all of which rely on qualified memory services. These allocators abstract the fundamental interface operations for DOM and DCM. How smaller allocations work depends entirely on the implementation of the specific memory service. An action such as moving a buffer between memory domains might later be handled by Mnemonic directly or by Arrow itself; I think this kind of action could be fairly straightforward because there may be no customizable links between ArrowBufs.

@bigdata-memory
Author

Regarding the optional dependency, I think we need to design a well-defined mechanism to make it possible. Mnemonic already provides one and will define a schema to make this more flexible.
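One standard JVM mechanism for this kind of optional dependency is `java.util.ServiceLoader`; this is a swapped-in illustration, not the mechanism Mnemonic provides. In the sketch, `BackendProvider` and `BackendSelector` are hypothetical: an optional module would implement the interface and register it under `META-INF/services/`, and when no provider is on the classpath the selector falls back to a default, so core Arrow would not need a hard dependency on Mnemonic.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical SPI. An optional module (say, arrow-mnemonic) would ship an implementation and
// register it in META-INF/services/<fully qualified interface name>. In real code this interface
// would be public and live in its own file.
interface BackendProvider {
    String name();
}

final class BackendSelector {
    // Fallback used when no optional module is on the classpath.
    static final BackendProvider DEFAULT = () -> "default-pooled";

    static BackendProvider select(ClassLoader loader) {
        Iterator<BackendProvider> providers = ServiceLoader.load(BackendProvider.class, loader).iterator();
        return providers.hasNext() ? providers.next() : DEFAULT;
    }
}
```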

@wesm
Member

wesm commented Nov 4, 2018

Closing this as stale for now

@wesm wesm closed this Nov 4, 2018
jikunshang pushed a commit to jikunshang/arrow that referenced this pull request May 6, 2020
fix SendCreateRequest, miss a parameter
kou pushed a commit that referenced this pull request May 10, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). #7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes #7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
zhztheplayer pushed a commit to zhztheplayer/arrow-1 that referenced this pull request Nov 9, 2021
jayhomn-bitquill pushed a commit to Bit-Quill/arrow that referenced this pull request Aug 10, 2022
* Add toString to Time obj in Time#toString

* Improve Time toString

* Fix maven plugins

* Revert "Update java/flight/flight-jdbc-driver/src/test/java/org/apache/arrow/driver/jdbc/accessor/impl/calendar/ArrowFlightJdbcTimeStampVectorAccessorTest.java"

This reverts commit 00808c0.

* Revert "Merge pull request apache#29 from rafael-telles/Timestamp_fix"

This reverts commit 7924e7b, reversing
changes made to f6ac593.

* Fix DateTime for negative epoch

* Remove unwanted change

* Fix negative timestamp shift

* Fix coverage

* Refactor DateTimeUtilsTest
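The commit list does not show the actual fixes for negative epoch values. A frequent cause of such bugs is that Java's `/` and `%` truncate toward zero, so a pre-1970 timestamp split into seconds and milliseconds yields a negative millisecond part. The following is a minimal sketch of that pitfall and the floor-based alternative, assuming this is the kind of issue involved; `EpochMillisDemo` and its methods are hypothetical names:

```java
import java.time.LocalTime;

final class EpochMillisDemo {
  private EpochMillisDemo() {}

  /** Naive split: integer division truncates toward zero, so -1 ms becomes {0 s, -1 ms}. */
  static long[] naiveSplit(long epochMillis) {
    return new long[] {epochMillis / 1000, epochMillis % 1000};
  }

  /** Floor-based split: seconds round toward negative infinity, millis stay in [0, 999]. */
  static long[] floorSplit(long epochMillis) {
    return new long[] {Math.floorDiv(epochMillis, 1000L), Math.floorMod(epochMillis, 1000L)};
  }

  /** UTC time of day for an epoch-millis value; also correct for negative inputs. */
  static LocalTime timeOfDayUtc(long epochMillis) {
    long millisOfDay = Math.floorMod(epochMillis, 86_400_000L);
    return LocalTime.ofNanoOfDay(millisOfDay * 1_000_000L);
  }
}
```

For `-1` (1969-12-31T23:59:59.999Z), `naiveSplit` gives `{0, -1}`, while `floorSplit` gives `{-1, 999}` and `timeOfDayUtc` gives `23:59:59.999`.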